Note: I'll be very happy to accept suggestions to make this more lucid for the readers to understand. Anyone seeing this, having a suggestion in mind, please leave a message at my

  1. linkedin profile: www.linkedin.com/in/pratik-karmakar-271a89148 or
  2. email : pkpratikkarmakar@gmail.com

Fourier Transform

Motivation:

The motivation behind fourier transform or series is to find out the different frequency components that are constructing a signal and find their amount of contribution to the construction of that particular signal. The signals can be temporal or spatial in nature and both will be converted into frequency domain using fourier transform.

Let's say, we have a function $$f(t)=5\sin(t)+2\sin(2t)+6\sin(3t)$$ We can clearly see that there are 3 sinusoids of 3 different frequencies that are making up the function. The first component has frequency 1, amplitude 5, the second has frequency 2, amplitude 2 and the third has frequency 3 and amplitude 6.

We'll have a look at each of these components individually and the function as a whole:

In the above figures we see the original signal and its components. We see that different frequency components have different amplitudes. This implies that the components have different level of contribution in forming the function. Also, we'll see the difference in contribution of the different components in the power spectrum of this function.

Let's take a big leap and say, every signal can be formed by some (or infinitely many) sinusoids of various amplitudes and phases. Seems absurd, right? Let's have a look at some very familiar functions.

  1. Rectangular wave function
  2. Triangular wave function

(Both are periodic functions)

1. Rectangular wave function

Shown below is a rectangular wave of period 2s:

Now let's try to form this function with sinusoids:

2. Triangular wave function (or Sawtooth wave)

Shown below is a triangular wave (sawtooth wave) of period 2s

We see that the above plots show how we are nearing the actual wave functions by adding up different sinusoids of varying frequency and amplitude. Let us see how this approximation works with increase in number of sinusoids (or harmonics) taken in consideration:

Change in values of input to the triangapp() and rectapp() functions is showing us clearly that more the sinusoidal components we take into consideration, better is the approximation of the waveforms. This gives us an intuition that if we take infinitely many sinusoids (theoretically), we'll get the exact waveform.

Moving on further:

So far we have seen the intuition behind the Fourier transform. Let's get our hands into the mathematics involved in it: Now, we have two terms at our hand, namely Fourier Series and Fourier Transform. Loosely speaking, Fourier Series finds out the components of periodic signals, whereas Fourier Transform is used for obtaining the components of non periodic signals (with finite area under the curve).

Say, $x(t)$ is a continuous non periodic signal and its Fourier transform is represented by $X(\omega)$, then $$X(\omega)=\int_{-\infty}^{+\infty}x(t)e^{-j\omega t}dt$$ where $j=\sqrt{-1}$,

And if say, z(t) is a periodic function, we get Fourier series of z(t), and the Fourier coefficients, $c_n$ are obtained as $$c_n=(1/T)\int_Tz(t)e^{-jn\omega_0t}dt=(1/T)\int_{-T/2}^{+T/2}z(t)e^{-jn\omega_0t}dt$$

and $$z(t)=\sum_{n=-\infty}^{+\infty}c_ne^{jn\omega_0t}$$

Signals and dealing with them:

Signals can be anything that carries some information. A sound wave, a photograph, a video, an electromagnetic wave etc. are examples of signals. Difference is in their domain mostly. The domains of these signals can be temporal (time varying signals), spatial (signals varying with position) or both temporal and spatial (eg. videos).

All these signals can be converted into frequency domain using Fourier transform or Fourier series. We'll mostly see the use of Fourier transform as signals that we mostly deal with are non periodic in nature. Moreover, as we're mostly interested in processing those signals and extracting features from them, and this part will be done using our computers, thus the signals are digital in nature, i.e., discrete and quantised.

Having said this, we eliminate the use of continuous Fourier transform in our case, we'll be using Discrete Fourier Transform (aka DFT) mostly.

Say, we have two dimensions $x$ and $y$ and a function $f(x,y)$, then its discrete fourier transform is given by

$$F(u,v)=\sum_{m=-\infty}^{+\infty}\sum_{n=-\infty}^{+\infty}f(m,n)e^{-j2\pi(umx_0+vny_0)}$$$$f(m,n)=(1/UV)\int_0^U\int_0^VF(u,v)e^{j2\pi(umx_0+vny_0)}dudv$$

where, $x_0$ and $y_0$ are the intervals between consecutive samples along $x$ and $y$ axes.

Now, say we're working with an image, then the signal for us will be the pixel values which are varying with position and thus an image is a spatial discrete signal. An image is of finite dimension and thus the limits of m and n will not be $-\infty$ to $\infty$ anymore, rather will be 0 to (M-1) and 0 to (N-1) if the dimension of the image is MxN.

Seeing these expressions one must be wondering how these are helping us in finding the constituent frquency components of the function. Let's take a look at the quantity $e^{-j\omega_0t}$. Well, at a glance this too looks difficult to understand and interpret, so let's scale things down and have a look at $e^{jt}$: On the Argand plane this can be seen as a vector of unit length rotating with angular frequency 1 rad/s in the anti-clockwise direction. These rotating vectors are named phasors in mathematics. So this tells us that $e^{-j\omega_0t}$ is a phasor of length 1 unit rotating with frequency $\omega_0$ in the clockwise direction. This phasor is basically a 2D projection of the helix that $e^{-j\omega_0t}$ is representing.

Sounds line a big jump! Where does this helix come from? Let's have a closer look: $$e^{jt}=cos(t)+jsin(t)$$ Let's take $x=cos(t)$(the real axis) and $y=sin(t)$(the imaginary axis) and $z=t$, then we'll get a helix that is the locus of the point moving forward in $z$ direction with time, while making circular loops perpendicular to the $z$ direction.

The above figures are to help visualise the term $e^{jt}$, the last circular figure is what we see on an argand plane as the locus of the phasor that we talked about. The projections on the $x-z$ and $y-z$ planes show the two constituent components namely the real one, $cos(t)$ and the imaginary one, $sin(t)$. So, this two sinusoids (phased out by $90^o$ are forming this Cisoid.)

This visualisation should clearly imply that $e^{-jt}$ is a helix, rotating in just opposite direction.

Moving on from $e^{-jt}$ to $e^{-j\omega t}$, with increase in $\omega$ we see a decrease the pitch(distance travelled along z in one time period) of the helix, which also means that the phasor is rotating faster now in $x-y$ plane.

Explanation: if $\omega T$=$2\pi$ (angle swept to complete once cycle), then $T=2\pi/\omega$, i.e the phasor completes one rotation in $2\pi/\omega$ seconds.Thus in one second it completes $1/T$ cycles. So, cycles per second, $f=1/T=\omega/{2\pi}$ Hz.

So, $\omega$ is basically the frequency of rotation of the phasor (radians/sec), in Hertz that would be represented as $f=\omega/{2\pi}$ Hz or cycles per second(cps).

Before moving on further with intuitions, we need to explore some more mathematical details:

  1. Visualizing the sinusoids as sum of two phasors rotating in opposite directions.
  2. Having an idea about the Dirac Delta function.
  1. As $$e^{j\omega t}=cos(\omega t)+jsin(\omega t)$$ and $$e^{-j\omega t}=cos(\omega t)-jsin(\omega t)$$ we can write $$sin(\omega t)=(e^{j\omega t}-e^{-j\omega t})/2j$$ and $$cos(\omega t)=(e^{j\omega t}+e^{-j\omega t})/2$$ that is both are sums of two phasors rotating in opposite directions (as we have seen the interpretations of $e^{j\omega t}$ and $e^{-j\omega t}$).
  1. The Dirac Delta function (aka Impulse function) or $\delta(t)$ is defined as a function that gives infinite value where the input is zero, else zero. And$$\int_{-\infty}^{+\infty}\delta(t)dt=1$$ Now, this is approximated in different ways. One is given by $$\delta(t)=lim_{a->\infty}sin(at)/{\pi t}=lim_{a->\infty}asin(at)/{\pi at}=lim_{a->\infty}asinc(at)$$ where $$sinc(t)=sin(t)/{\pi t}$$

Let's have a look at how $sinc$ approximation works for the impulse function:

This shows how we near the $\delta(t)$ function with increasing value of $a$.

Why are we discussing this function? Here's the mathematical clarification:

$$\int_{-\infty}^{+\infty}sin(\omega_0 t)e^{-j\omega t}dt=j\pi(\delta(\omega+\omega_0)-\delta(\omega-\omega_0))$$

of which if we plot the magnitude, looks like as shown below (when $\omega_0=5 rad/s$):

This shows two peaks at $5 rad/s$ and $-5 rad/s$. The negative value indicates the existence of a phasor rotating in clockwise direction, while the positive one indicates the existence of a phasor rotating in anti-clockwise direction, i.e. we are getting the frequencies of the constituent phasors of $sin(5t)$.

And this tells us why and how we are getting the constituent frequencies of a function $x(t)$ when we perform $$\int_{-\infty}^{+\infty}x(t)e^{-j\omega t}dt$$ This is because we have already seen that any $x(t)$ (signal or function) can be expressed as a sum of some (or infinitely many) sinusoids, and the operation gives us peaks at the frequencies of the constituent sinusoids.

Getting back to our initial sample function $f(t)=5\sin(t)+2\sin(2t)+6\sin(3t)$, we'll get something like the figure shown below when it is examined in frequency domain:

So in frequency domain we get peaks at $\omega$=$-3, -2, -1, 1, 2, 3$ $rad/s$, which is pretty obvious to us by now!

So this clearly shows how we recognise the frequencies of the constituent sinusoids with the operation $$\int_{-\infty}^{+\infty}x(t)e^{-j\omega t}dt$$ Now, we need to talk about how the phase information is extracted from the same thing:

As $e^{-j\omega t}=cos(\omega t)-jsin(\omega t)$, whenever we multiply this with $sin(\omega_0t)$ we get $sin(\omega_0t)cos(\omega t)-jsin(\omega_0t)sin(\omega t)$. Now whatever be the $\omega$. the first part on integrating becomes 0, the second part (i.e. the imaginary part) remains though and gives a peak at $\omega=\omega_0$.

And when multiplied by $cos(\omega_0t)$, the real part gives a peak at $\omega=\omega_0$ and the imaginary part vanishes.

Now, say we have a signal $x(t)=acos(\omega_0t)+bsin(\omega_0t)$ (notice, this is one frequency component only), which can be written as $\sqrt{a^2+b^2}cos(\omega_0t-\phi)$, where $\phi=tan^{-1}(b/a)$. And we call this $\phi$ the phase of the signal, and $\sqrt{a^2+b^2}$ the magnitude. ($(b/\sqrt{a^2+b^2})=sin(\phi)$ and ($a/\sqrt{a^2+b^2})=cos(\phi)$)

Now multiplying this with $e^{-j\omega t}$ and integrating gives us a real component corresponding to the $acos(\omega_0 t)$ term and an imaginary component corresponding to the $bsin(\omega_0t)$ term. Thus we get the phase using $tan^{-1}(b/a)$.

Lets have a look at the images below:

This clearly shows that the cosine part of a signal has no component along the imaginary axis and the sine part has no component along the real axis. Let's say we have a cosine component with phase $\pi/6$: $$cos(\omega_0t+\pi/6)=cos(\omega_0t)cos(\pi/6)-sin(\omega_0t)sin(\pi/6)$$

The importance: Interplay of domains

Our signal as we said can be temporal or spatial or a combination of both. In all the cases we can and need to analyse and play with their frequency characteristics. This interplay of domains is facilitated by Fourier transform. Here lies the importance of this enormously powerful tool.

Fourier transform on images

An image is made up of pixels having different values distributed in 1 or 3 (grayscale or RGB) channels. So, images can be interpreted as 2D digital signals and we can find out the frequencies presented in an image using FFT. (Here obviously we use the Discrete Fourier Transform)

As the images above show, one image is a rectangular wave along the Y axis, the other is a rectangular wave along the X axis. And we have seen how a rectangular wave is made up of different frequency components. The difference is just that, here the domain is spatial instead of temporal.

The two figures above show that the first image has different frequency components along the Y axis but nothing along the X axis and vice versa for the second one. The result seems pretty obvious, right? In the first image, we had no variation of pixel values along the X axis, which means, the only frequency component along X must be the zero frequency component (or the DC component). The axes interchange for the second image.

Now these images above show, one image is a rectangular wave along the Y=X line, the other is a rectangular wave along the Y=-X line. (Well, these are not perfect rectangular waves though) So, let's have a look at how they look in frequency domain:

The images above show how the distribution in frequency domain changes with the alignment of edges in an image. This should give us a clear idea about how we get into frequency domain from pixel values of an image.

Here we wrap up this notebook with the basic understanding and motivation of Fourier Transform.

Would be happy to dive deeper into this.